Abstract

We present some new results which relate information to chaotic dynamics.
In our approach the quantity of information is measured by the Algorithmic
Information Content (Kolmogorov complexity) or by a computable version of
it (Computable Information Content), in which the information is measured
by means of a suitable universal data compression algorithm. We apply these
notions to the study of dynamical systems by considering the asymptotic
behavior of the quantity of information necessary to describe their orbits.
When a system is ergodic, this method provides an indicator which equals
the Kolmogorov-Sinai entropy almost everywhere. Moreover, if the entropy
is 0, our method gives new indicators which measure the unpredictability of
the system and allow us to classify various kinds of weak chaos. This is the
main motivation of this work. The behaviour of a zero entropy dynamical
system is far from being completely predictable, except in particular cases.
In fact there are 0-entropy systems which exhibit a sort of weak chaos, where
the information necessary to describe the orbit behavior increases with time
more than logarithmically (periodic case) even if less than linearly
(positive entropy case). We also believe that the above method is useful for
the classification of zero entropy time series. To support this point of
view, we show some theoretical and experimental results in specific cases.
1 Introduction

In this paper, we present some results on the connections between information 
theory and dynamical systems. We analyse the asymptotic behavior of the 
quantity of information necessary to describe an orbit of a dynamical system 
with a given accuracy. This analysis gives some indicators of complexity of the 
orbit itself. 

These results have the following features and motivations: 

• the complexity indicators are defined for a single orbit and can be es- 
timated numerically; hence they can be used in simulations and in the 
analysis of experimental time series; 

• when the system is ergodic, the orbit complexity equals the Kolmogorov- 
Sinai entropy almost surely; thus our method provides a new characteri- 
zation of the entropy and an alternative way to compute it; 

• if the entropy is 0, the asymptotic behavior of the information provides a
measure of the unpredictability of the system and allows us to classify
various kinds of weak chaos. This is the main motivation of this work.

In recent papers (see e.g. [17]), tools from algorithmic information theory
have been used to define and study some indicators of orbit complexity. These
indicators are invariant up to topological conjugacy. In some special cases the
theory allows them to be calculated explicitly and gives a characterization of
various kinds of 0-entropy dynamics. Moreover, it has been proved that there
are quantitative relations between these indicators and the sensitivity of the
system to initial conditions. This fact shows that information is strongly
related to chaos even in the 0-entropy case. The approach of these papers makes
use of the Algorithmic Information Content (Kolmogorov complexity) as a measure
of the information. Unfortunately the Algorithmic Information Content (AIC) is
not a computable function (see Section 2.3), so the related complexity
indicators cannot be used in computer simulations or in the analysis of
experimental time series.

The aim of this paper is to overcome these difficulties by defining orbit
complexity indicators which are suitable for computer experiments. The main
idea consists in replacing the AIC by a Computable Information Content (CIC)
which is measured using suitable compression algorithms. We prove theorems
which support the use of this method in the experimental setting. In
particular, we prove that our method gives the same asymptotic behaviour of the
quantity of information as the AIC in two important cases: the positive entropy
case and the Manneville map, which is a paradigmatic example of an intermittent
dynamical system.

Moreover, even when it is not possible to give theoretical estimates, we have 
performed some numerical experiments to investigate how the method works in 
practice. 

The paper is organised as follows. 

In Section 2, we recall the main notions of information content for finite 
strings and introduce the notion of Computable Information Content whose 
definition is based on Compression Algorithms. 

In Section 3, we consider infinite strings and we define their complexity as 
the time average information content; we prove that the complexity of almost 
every string generated by an ergodic information source equals the Shannon 
entropy of the source itself. 

In Section 4, we consider dynamical systems and, via the symbolic dynamics
method, we apply the results of the previous sections. In particular, we prove
that the complexity of almost every orbit equals the Kolmogorov-Sinai entropy,
provided that the system is ergodic.

In Section 5, we consider the 0-entropy case (weak chaos) and we intro- 
duce some indicators which are able to detect different kinds of weakly chaotic 
dynamics. 

In Section 6, we analyse two compression algorithms (LZ77 and CASToRe)
and we prove some theorems about them which provide a bridge between
the abstract theory and concrete computations. In particular, we prove that in
the case of the Manneville map the CIC (based on LZ77) has the same
asymptotic behaviour as the AIC.

In the final Section (Section 7), we show the results of some numerical
experiments. In the cases in which the theory is able to estimate the various
invariants, our empirical results agree with the theoretical ones. In the other
cases, we just show how our method can be applied, and we also obtain an
empirical result which, as far as we know, does not have any theoretical
explanation (Casati-Prosen map, § 7.4).



2 Entropy, information and complexity of finite 
strings 

The intuitive meaning of the quantity of information $I(s)$ contained in a
string $s$ is the following:

    $I(s)$ is the length of the smallest binary message from which you can
    reconstruct $s$.

Thus, formally,

    $$I : A^* \to \mathbb{N}$$

is a function from the set $A^*$ of finite strings on a finite alphabet $A$
which takes values in the set of natural numbers. There are different notions
of information and some of them will be discussed here. The first one is due to
Shannon.

In his pioneering work, Shannon defined the quantity of information as a
statistical notion, using the tools of probability theory. Thus in Shannon's
framework, the quantity of information contained in a string depends on its
context. For example, the string 'pane' contains a certain amount of
information when it is considered as a string coming from the English language;
the same string 'pane' contains much less Shannon information when it is
considered as a string coming from the Italian language, because there it is
much more common (in fact it means "bread"). Roughly speaking, the Shannon
information of a string $s$ is given by

    $$I(s) = \log_2 \frac{1}{p(s)} \qquad (1)$$

where $p(s)$ denotes the probability of $s$ in a given context. The logarithm
is taken in base two so that the information can be measured in binary digits
(bits).^1

If in a language the occurrences of the letters are independent of each other,
the information carried by each letter is given by

    $$I(a_i) = \log \frac{1}{p_i},$$

where $p_i$ is the probability of the letter $a_i$. Then the average
information of each letter is given by

    $$h = \sum_i p_i \log \frac{1}{p_i}. \qquad (2)$$

Shannon called the quantity $h$ entropy for its formal similarity with
Boltzmann's entropy.

1 From now on, we will use the symbol "log" just for the base 2 logarithm
"$\log_2$" and we will denote the natural logarithm by "ln".
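As a minimal numerical sketch of equations (1) and (2), the following Python fragment computes the Shannon information of a single event and the average information per letter. The four-letter probability vector is a hypothetical example chosen only for illustration; it does not come from any language discussed in the text.

```python
import math


def shannon_information(p: float) -> float:
    """Information, in bits, of an event of probability p (equation (1))."""
    return -math.log2(p)


def letter_entropy(probabilities) -> float:
    """Average information per letter, h = sum_i p_i log(1/p_i) (equation (2))."""
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)


# Hypothetical alphabet of four independent letters (illustrative values only).
probs = [0.5, 0.25, 0.125, 0.125]
print(shannon_information(0.125))  # 3.0 bits for the rarest letter
print(letter_entropy(probs))       # 1.75 bits per letter on average
```

Rare letters carry more information than common ones, and the entropy is the probability-weighted mean of these values.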

We are interested in giving a definition of quantity of information of a single 
string independent of the context and of any probability measure. Of course we 
will require this definition to be strictly related to the Shannon entropy when 
we equip the space of all the strings with a suitable probability measure. 

In order to be more precise it is necessary to give some notations and defi- 
nitions. 

Let us consider a finite alphabet $A$ and the set $A^*$ of finite strings on
$A$, that is $A^* = \bigcup_{n=1}^{\infty} A^n$.
Now let

    $$F : A^* \to \{0,1\}^*$$

be an injective function, and set

    $$I_F(s) = |F(s)|, \qquad K_F(s) = \frac{I_F(s)}{|s|},$$

where $|s|$ is the length of the string $s$.

Let us consider the usual shift map $\sigma : A^{\mathbb{N}} \to A^{\mathbb{N}}$
defined by

    $$\sigma((\omega_i)_{i \in \mathbb{N}}) = (\omega_{i+1})_{i \in \mathbb{N}}.$$

For a probability measure $\mu$ on $A^{\mathbb{N}}$ which is invariant with
respect to the shift, we denote by $h(\mu)$ the well-known Shannon entropy of
the measure.

Given a string $\omega \in A^{\mathbb{N}}$, we will denote by
$\omega^n \in A^n$ the string which consists of the first $n$ digits of
$\omega$.

Now we can give the following definition of information and complexity.

Definition 1 (Information measure). If for any ergodic measure $\mu$ on
$A^{\mathbb{N}}$ we have that for $\mu$-almost every $\omega \in A^{\mathbb{N}}$

    $$\limsup_{n \to +\infty} K_F(\omega^n) = h(\mu), \qquad (3)$$

then

• $F$ is called an ideal coding;

• $I_F(s)$ is called the information content of $s$ (with respect to $F$);

• $K_F(s)$ is called the complexity (or compression ratio) of $s$ (with
respect to $F$).

Later we will see that ideal codings exist; by condition (3) they are
asymptotically equivalent to each other.

This definition is given without assuming recursivity for F. Later on, when 
we consider F as a coding procedure, we will mean that F is a recursive function. 

In the following we will also see that choosing F in a suitable way, it is 
possible to investigate interesting properties of dynamical systems with null 
Kolmogorov-Sinai entropy. 
2.1 Empirical entropy



The empirical entropy is a quantity that lies halfway between the Shannon
entropy and the pointwise definition of complexity. The empirical entropy of a
given string $s$ is a sequence of numbers $H_l$ giving statistical measures of
the average information content of the digits of $s$.

Let $s$ be a finite string of length $n$. We now define $H_l(s)$, $l \ge 1$,
the $l$-th empirical entropy of $s$. We first introduce the empirical
frequencies of a word in the string $s$: let us consider $w \in A^l$, a string
on the alphabet $A$ with length $l$; let $s^{(m_1,m_2)} \in A^{m_2-m_1}$ be the
segment of $s$ starting at the $m_1$-th digit and ending just before the
$m_2$-th digit (digits are numbered from 0); let

    $$\delta(s^{(i,i+l)}, w) =
      \begin{cases} 1 & \text{if } s^{(i,i+l)} = w \\ 0 & \text{otherwise} \end{cases}
      \qquad (0 \le i \le n-l).$$

The relative frequency of $w$ in $s$ (the number of occurrences of the word $w$
divided by the total number of $l$-digit subwords) is then

    $$P(s,w) = \frac{1}{n-l+1} \sum_{i=0}^{n-l} \delta(s^{(i,i+l)}, w).$$

This can be interpreted as the "empirical" probability of $w$ relative to the
string $s$. Then the $l$-empirical entropy is defined by

    $$H_l(s) = -\frac{1}{l} \sum_{w \in A^l} P(s,w) \log P(s,w). \qquad (4)$$

The quantity $l H_l(s)$ is a statistical measure of the average information
content of the $l$-digit long substrings of $s$.
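The definition of the empirical entropy translates directly into a few lines of code. The sketch below assumes that strings are ordinary Python `str` objects and that words with zero empirical frequency contribute nothing to the sum in (4); the function name `empirical_entropy` is ours.

```python
import math
from collections import Counter


def empirical_entropy(s: str, l: int) -> float:
    """l-th empirical entropy H_l(s) of a finite string, as in equation (4)."""
    n = len(s)
    if not 1 <= l <= n:
        raise ValueError("need 1 <= l <= len(s)")
    windows = n - l + 1  # total number of l-digit subwords of s
    counts = Counter(s[i:i + l] for i in range(windows))
    total = 0.0
    for occurrences in counts.values():
        p = occurrences / windows  # empirical probability P(s, w)
        total -= p * math.log2(p)
    return total / l


s = "0101010101"
print(empirical_entropy(s, 1))  # 1.0: both digits occur with frequency 1/2
print(empirical_entropy(s, 2))  # smaller: only the words "01" and "10" occur
```

For this periodic string the first-order entropy sees a fair coin, while the second-order entropy already detects part of the regularity.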



2.2 Computable Information Content 

Suppose we have some recursive lossless (reversible) coding procedure
$Z : A^* \to \{0,1\}^*$ (for example, one of the data compression algorithms
available on any personal computer). Since the coded string contains all the
information that is necessary to reconstruct the original string, we can
consider the length of the coded string as an approximate measure of the
quantity of information that is contained in the original string.

If $Z$ is an ideal coding (according to Definition 1) then, as before, the
information content of $s$ with respect to $Z$ is defined as
$I_Z(s) = |Z(s)|$.
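To make the idea concrete, here is a minimal sketch in which the role of $Z$ is played by `zlib` from the Python standard library. This choice is an assumption made only for illustration: `zlib` is a practical LZ77-type compressor with a bounded window, and nothing here claims that it is an ideal coding in the sense of Definition 1. Each byte of the input is treated as one symbol.

```python
import random
import zlib


def information_content(s: bytes) -> int:
    """I_Z(s) = |Z(s)| in bits, with Z = zlib (illustrative choice of coding)."""
    return 8 * len(zlib.compress(s, 9))


def compression_ratio(s: bytes) -> float:
    """K_Z(s) = I_Z(s) / |s|, in bits per symbol (one byte = one symbol)."""
    return information_content(s) / len(s)


periodic = b"01" * 5000
rng = random.Random(0)  # fixed seed so that the example is reproducible
irregular = bytes(rng.choice(b"01") for _ in range(10000))

# The coding is lossless: the original string can be reconstructed exactly.
assert zlib.decompress(zlib.compress(periodic, 9)) == periodic

print(compression_ratio(periodic))   # close to 0 bits per symbol
print(compression_ratio(irregular))  # of the order of 1 bit per symbol
```

The regular string needs far fewer bits per symbol than the irregular one, which is the behaviour the length of the coded string is meant to capture.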

Of course not all coding procedures are equivalent or perform equally well,
so some care is necessary in the definition of information content.
For this reason we introduce the notion of optimality of an algorithm $Z$,
defined by comparing its compression ratio with the empirical entropy.

An algorithm $Z$ is considered optimal if its compression ratio $|Z(s)|/|s|$
is asymptotically less than or equal to $H_k(s)$ for each $k$.
Definition 2 (Optimality). A reversible coding algorithm $Z$ is optimal if
for every $k \in \mathbb{N}$ there is a function $f_k$, with $f_k(n) = o(n)$,
such that for all finite strings $s$

    $$|Z(s)| \le |s| \, H_k(s) + f_k(|s|).$$

Many data compression algorithms that are used in applications have been
proved to be optimal.

Remark 3. The universal coding algorithms LZ77 and LZ78 satisfy Definition 2;
the proof can be found in the literature on these algorithms.

Using the definition above, we are able to define the Computable Information
Content of a string:

Definition 4. The Computable Information Content of a string $s$ is an
information measure (in the sense of Def. 1) where the ideal coding $F$ is an
optimal compression algorithm.



The notion of optimality is not enough if we ask a coding algorithm to be able
to reproduce the rate of convergence of the sequence $H_k(s)$ as
$|s| \to \infty$ for strings generated by weakly chaotic dynamical systems, for
which $\lim_{|s| \to \infty} H_k(s) = 0$. Indeed, while in positive entropy
systems optimality implies that asymptotically the compression ratio
$|Z(s)|/|s|$ is equivalent to $H_k(s)$, in weakly chaotic systems it may happen
that the dominant term in the bound on the compression ratio given by
optimality is the function $f_k(|s|)/|s|$.

For example, let us consider the string $0^n 1$ and the LZ78 algorithm: then
$H_k(0^n 1)$ goes like $\log(n)/n$, while $|LZ78(0^n 1)|/n$ goes like
$n^{1/2} \log(n)/n$. This implies that optimality is not sufficient to have a
coding algorithm able to characterize 0-entropy strings according to the rate
of convergence of their entropy to 0. For this aim we need an algorithm having
the same asymptotic behavior as the empirical entropy. In this way, even in the
0-entropy case our algorithm will provide a meaningful measure of the
information. The following definition, taken from earlier work on this problem,
is an approach to defining optimality of a compression algorithm for the
0-entropy case.
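The gap between the two rates in the example $0^n 1$ can be checked numerically. The sketch below counts the phrases of the LZ78 incremental parsing and compares a rough code length with $|s| H_1(s)$. The cost model (one dictionary index plus one symbol per phrase) is a simplifying assumption, and the function names are ours.

```python
import math


def lz78_phrases(s: str) -> int:
    """Number of phrases in the LZ78 incremental parsing of s."""
    dictionary = {""}
    phrase = ""
    count = 0
    for symbol in s:
        phrase += symbol
        if phrase not in dictionary:  # shortest word not seen before
            dictionary.add(phrase)
            count += 1
            phrase = ""
    if phrase:  # trailing phrase that repeats an earlier one
        count += 1
    return count


def lz78_bits(s: str, alphabet_size: int = 2) -> float:
    """Rough code length: every phrase costs an index plus one new symbol."""
    c = lz78_phrases(s)
    return c * (math.log2(c + 1) + math.log2(alphabet_size))


def h1_total_bits(n: int) -> float:
    """(n + 1) * H_1(0^n 1): first empirical entropy times the string length."""
    length = n + 1
    p0, p1 = n / length, 1 / length
    return -length * (p0 * math.log2(p0) + p1 * math.log2(p1))


n = 10000
s = "0" * n + "1"
print(lz78_phrases(s))   # 141, close to sqrt(2 n)
print(lz78_bits(s))      # grows like sqrt(n) * log(n)
print(h1_total_bits(n))  # grows like log(n)
```

The zeros are parsed into phrases of lengths 1, 2, 3, ..., so about $\sqrt{2n}$ phrases are needed, whereas the empirical entropy only charges about $\log n$ bits for the whole string.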

Definition 5 (Asymptotic Optimality). A compression algorithm $Z$ is called
asymptotically optimal with respect to $H_k$ if it is optimal and there are a
function $g_k$, with $g_k(n) = o(n)$, and a constant $\lambda > 0$ such that
for every $s$ with $H_k(s) \ne 0$

    $$|Z(s)| \le \lambda |s| H_k(s) + g_k(|Z(s)|).$$

It is not trivial to construct an asymptotically optimal algorithm. For
instance, the well known Lempel-Ziv compression algorithms are not
asymptotically optimal. LZ78 is not asymptotically optimal even with respect to
$H_1$. Some examples have been described of algorithms (LZ78 with RLE, and
LZ77) which are asymptotically optimal with respect to $H_1$, but these
examples are not asymptotically optimal for each $H_k$ with $k \ge 2$. The
asymptotic optimality of LZ77 with respect to $H_1$ (Theorem 31) is however
enough to prove (see Section 6.2, Theorem 34) that LZ77 can correctly estimate
the information coming from an important example of weak chaos: the Manneville
map.

The set of compression algorithms that are asymptotically optimal with respect
to each $H_k$ is not empty: an example has been given of a compression
algorithm that is asymptotically optimal for each $H_k$. The algorithm is
similar to the Kolmogorov frequency coding algorithm. This compression
algorithm is not of practical use because of its computational complexity.

To our knowledge the problem of finding a fast asymptotically optimal
compression algorithm is still open.

2.3 Algorithmic Information Content 

One of the most important information functions is the Algorithmic Information
Content (AIC). In order to define it, it is necessary to define the notion of
partial recursive function. We limit ourselves to giving an intuitive idea
which is very close to the formal definition. We can consider a partial
recursive function as a computer $C$ which takes a program $P$ (namely a binary
string) as an input, performs some computations and gives a string $s = C(P)$,
written in the given alphabet $A$, as an output. The AIC of a string $s$ is
defined as the length of the shortest binary program $P$ which gives $s$ as its
output, namely

    $$AIC(s, C) = \min\{|P| : C(P) = s\}.$$

We require that our computer is a universal computing machine. Roughly
speaking, a computing machine is called universal if it can simulate any other
machine. In particular every real computer is a universal computing machine,
provided that we assume that it has virtually infinite memory. For a precise
definition see e.g. the standard references on algorithmic information theory.
We have the following theorem due to Kolmogorov.

Theorem 6. If $C$ and $C'$ are universal computing machines then

    $$|AIC(s, C) - AIC(s, C')| \le K(C, C'),$$

where $K(C, C')$ is a constant which depends only on $C$ and $C'$ but not on
$s$.

This theorem implies that the information content AIC of $s$ with respect
to $C$ depends only on $s$ up to a fixed constant, so its asymptotic behavior
does not depend on the choice of $C$. For this reason from now on we will write
$AIC(s)$ instead of $AIC(s, C)$. The shortest program which gives a string as
its output is a sort of encoding of the string. The information which is
necessary to reconstruct the string is contained in the program.



We have the following result (for a proof see for example [19], Lemma 6):

Theorem 7. Let

    $$Z_C : A^* \to \{0,1\}^*$$

be the function which associates to a string $s$ the shortest program whose
output is $s$ itself^2 (namely, $AIC(s) = I_{Z_C}(s)$). If $Z$ is any
reversible coding, there exists a constant $M$, which depends on $C$ and $Z$
but not on $s$, such that

    $$|Z_C(s)| \le |Z(s)| + M. \qquad (5)$$

The inequality (5) says that $Z_C$ is in some sense optimal. Unfortunately this
coding procedure cannot be performed by any algorithm (Chaitin Theorem)^3.
This is a very deep statement and, in some sense, it is equivalent to the
Turing halting problem or to the Gödel incompleteness theorem. Hence the
Algorithmic Information Content is not computable by any algorithm.

This fact has very deep consequences for our discussion, as we will see later.
For the moment we can say that the AIC cannot be used as a reasonable physical
quantity since it cannot be measured; however it is very useful in proving
general theorems.
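Although the AIC itself cannot be computed, inequality (5) says that every reversible recursive coding gives a computable upper bound for it, up to an additive constant. A simple practical consequence, sketched below, is that the shortest output among several compressors is still such an upper bound. The three standard-library compressors are used here purely as examples of reversible codings; they are not the algorithms analysed in this paper.

```python
import bz2
import lzma
import zlib


def compressed_sizes(s: bytes) -> dict:
    """Length in bits of s under three off-the-shelf reversible codings."""
    return {
        "zlib": 8 * len(zlib.compress(s, 9)),
        "bz2": 8 * len(bz2.compress(s, 9)),
        "lzma": 8 * len(lzma.compress(s)),
    }


def best_compressed_bits(s: bytes) -> int:
    """Smallest of the three lengths: a computable upper bound (up to a
    constant independent of s) for the algorithmic information content."""
    return min(compressed_sizes(s).values())


s = b"0110" * 1000
print(compressed_sizes(s))
print(best_compressed_bits(s))
```

No finite family of compressors can turn this bound into an equality for all strings, which is the practical face of the non-computability discussed above.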



3 Information sources 

3.1 Infinite strings and complexity 

A symbolic dynamical system is given by $(\Omega, \mathcal{C}, \mu, \sigma)$.
The space $\Omega$ is the space $A^{\mathbb{N}}$ of the infinite sequences
$\omega = (\omega_i)_{i \in \mathbb{N}}$ of symbols in $A$. $\mathcal{C}$ is
the $\sigma$-algebra generated by the cylinders^4

    $$C(\omega^{(k,n)}) = \{\bar\omega \in \Omega : \bar\omega_i = \omega_i
      \text{ for } k \le i \le n-1\},$$

where $\omega^{(k,n)} = (\omega_i)_{k \le i \le n-1} =
(\omega_k, \omega_{k+1}, \dots, \omega_{n-1})$, the map $\sigma$ is the shift
map

    $$\sigma((\omega_i)_{i \in \mathbb{N}}) = (\omega_{i+1})_{i \in \mathbb{N}}$$

and $\mu$ is a $\sigma$-invariant probability measure on $\Omega$. A symbolic
dynamical system can also be viewed as an information source. For the purposes
of this work the two notions can be considered equivalent.

We now give different measures of complexity for infinite strings generated
by the symbolic dynamical system according to Definition 1, using the different
information measures defined above. Each definition of complexity of an
infinite string $\omega$ can be thought of as a measure of the average quantity
of information which is contained in a single digit of $\omega$.

2 If two programs of the same length produce the same string, we choose the
program which comes first in lexicographic order.

3 Actually, the Chaitin theorem makes a weaker statement: a procedure (computer
program) which states that a string of length $n$ can be produced by a program
shorter than $n$ must be longer than $n$.

4 We remark that $\mathcal{C}$ corresponds to the Borel $\sigma$-algebra when
$\Omega$ is equipped with the product topology, that is the topology induced by
the metric $d(\omega, \bar\omega) = \sum_{i \in \mathbb{N}}
\frac{1 - \delta(\omega_i, \bar\omega_i)}{2^i}$, where $\delta(\cdot,\cdot)$ is
the Kronecker delta.
Definition 8 (Complexity of infinite strings). If $\omega \in \Omega$ and
$Z : A^* \to \{0,1\}^*$ is a reversible universal coding procedure, we define
the computable complexity of $\omega$ with respect to $Z$ as

    $$K_Z(\omega) = \limsup_{n \to \infty} K_Z(\omega^n),$$

where $\omega^n = \omega^{(0,n)}$. In the same way, using the AIC, we define

    $$K(\omega) = \limsup_{n \to \infty} \frac{AIC(\omega^n)}{n}.$$

We also define the quantity $H(\omega)$. If $\omega$ is an infinite string,
$H(\omega)$ is a sort of Shannon entropy of the single string.

Definition 9. Using the definition of empirical entropy of finite strings, we
define

    $$H_l(\omega) = \limsup_{n \to \infty} H_l(\omega^n)$$

and

    $$H(\omega) = \lim_{l \to \infty} H_l(\omega).$$

The existence of this limit is proved in [32]. The following proposition is a
direct consequence of ergodicity (for the proof see again [32]).

Proposition 10. If $(\Omega, \mu, \sigma)$ is ergodic then
$H(\omega) = h_\mu(\sigma)$ (where $h_\mu$ is the Kolmogorov-Sinai entropy of
$\sigma$) for $\mu$-almost each $\omega$.

Moreover, from the definition of optimality it directly follows that:

Remark 11. If $Z$ is optimal then for each $\omega$ and for all $l$

    $$K_Z(\omega) \le H_l(\omega),$$

so that

    $$K_Z(\omega) \le H(\omega).$$
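For completeness, the step from optimality to Remark 11 can be spelled out as follows. The derivation assumes that the optimality inequality of Definition 2 has the form $|Z(s)| \le |s| H_l(s) + f_l(|s|)$ with $f_l(n) = o(n)$, which is how the surrounding text describes it.

```latex
\begin{align*}
|Z(\omega^n)| &\le n\,H_l(\omega^n) + f_l(n)
  && \text{(optimality, with } f_l(n) = o(n)\text{)} \\
K_Z(\omega^n) = \frac{|Z(\omega^n)|}{n}
  &\le H_l(\omega^n) + \frac{f_l(n)}{n} \\
K_Z(\omega) = \limsup_{n\to\infty} K_Z(\omega^n)
  &\le \limsup_{n\to\infty} H_l(\omega^n) + \lim_{n\to\infty}\frac{f_l(n)}{n}
   = H_l(\omega) \\
K_Z(\omega) &\le \lim_{l\to\infty} H_l(\omega) = H(\omega)
\end{align*}
```

The last line holds because the bound on $K_Z(\omega)$ is valid for every $l$ and the left hand side does not depend on $l$.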

Remark 12. As is intuitive, the compression ratio of $Z$ cannot be less than
the average information per digit as measured by the algorithmic information
content (Theorem 7); thus for all $\omega$ we have

    $$K_Z(\omega) \ge K(\omega).$$

This remark and the following Lemma are useful for the proof of the next
theorem.

Lemma 13 (Brudno). If $\mu$ is ergodic then $K(\omega) = h_\mu(\sigma)$ for
$\mu$-almost each $\omega$.

Then we have the following
Theorem 14. If $(\Omega, \mu, \sigma)$ is a symbolic dynamical system, $Z$ is
optimal and $\mu$ is ergodic, then for $\mu$-almost each $\omega$

    $$K_Z(\omega) = H(\omega) = K(\omega) = h_\mu(\sigma);$$

in particular optimality implies that the algorithm is an ideal coding, and
$I_Z$ and AIC are information measures (see Definition 1).

Proof. $H(\omega) = K(\omega) = h_\mu(\sigma)$ for $\mu$-almost each $\omega$,
by Proposition 10 and Brudno's Lemma above. Moreover we have that
$K_Z(\omega) \ge K(\omega)$ (Remark 12), and then
$K_Z(\omega) \ge h_\mu(\sigma)$ for $\mu$-almost each $\omega$.

On the other hand, $K_Z(\omega) \le H(\omega)$ (Remark 11), and then
$K_Z(\omega) = h_\mu(\sigma)$ for $\mu$-almost each $\omega$. □

This theorem shows that all the various information measures we have defined
in Section 2 agree when we study the long time asymptotic behavior of the
information necessary to describe a generic orbit of a positive entropy
source.
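A quick numerical illustration of this agreement, under loud assumptions: the source is a Bernoulli source with parameter $p$ (our choice of example), whose entropy is $-p \log p - (1-p)\log(1-p)$, and the compressor is `zlib`, which is used only as a convenient stand-in and is not an optimal algorithm in the sense of Definition 2 (its window is bounded). The estimate is therefore expected to stay above the entropy while following its trend.

```python
import math
import random
import zlib


def bernoulli_string(p: float, n: int, seed: int = 0) -> bytes:
    """n symbols from a Bernoulli(p) source, written as ASCII '0' and '1'."""
    rng = random.Random(seed)
    return bytes(49 if rng.random() < p else 48 for _ in range(n))


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits per symbol, of a Bernoulli(p) source."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bits_per_symbol(s: bytes) -> float:
    """Compression ratio of s under zlib, in bits per symbol."""
    return 8 * len(zlib.compress(s, 9)) / len(s)


for p in (0.1, 0.3, 0.5):
    s = bernoulli_string(p, 20000)
    print(p, binary_entropy(p), bits_per_symbol(s))
```

Lower-entropy sources compress better, and no lossless compressor can go systematically below the entropy; the remaining gap reflects the overhead of this particular compressor rather than anything in the theorem.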

If the measure $\mu$ is not ergodic, we can replace the almost everywhere
result above with an average result: the average complexity is equal to the
entropy.

Theorem 15. Let $(\Omega, \mathcal{C}, \sigma)$ be a symbolic dynamical system,
with a $\sigma$-invariant probability measure $\mu$. Then if $Z$ is optimal,

    $$h_\mu(\sigma) = \int_\Omega K_Z(\omega)\,d\mu
      = \int_\Omega H(\omega)\,d\mu = \int_\Omega K(\omega)\,d\mu.$$

Proof. First of all, we show that all the quantities to be integrated are
actually measurable. We show how to prove measurability for $K(\omega)$; the
argument applies unchanged to the others (one more limit has to be considered
for $H$). This argument is due to Brudno.

For any $t \in \mathbb{R}$, let $T = \{\omega \in \Omega : K(\omega) < t\}$.
The set $T$ can be written as

    $$T = \bigcup_{k=1}^{\infty} \bigcup_{N=1}^{\infty} \bigcap_{n > N}
      \{\omega \in \Omega : AIC(\omega^n) < n(t - 1/k)\},$$

and since all the sets in curly brackets are finite unions of cylinders,
measurability of the set $T$ and of $K(\omega)$ follows from classical theorems
of measure theory.

To obtain the statement of the theorem, we use the ergodic decomposition
theorem and its application to the Kolmogorov-Sinai entropy $h_\mu(\sigma)$
(see Katok-Hasselblatt, chapter 4, [23]). Let
$(\Omega_j, \mathcal{C}_j, \mu_j)_{j \in J}$ be an ergodic decomposition of
$(\Omega, \mathcal{C}, \sigma)$, that is, the $\Omega_j$ are invariant subsets
of $\Omega$, the $\mu_j$ are ergodic measures with support on $\Omega_j$, and
$J$ is a Lebesgue space with probability measure $P$. Then we have that

    $$\int_\Omega K(\omega)\,d\mu
      = \int_J \Big( \int_{\Omega_j} K(\omega)\,d\mu_j \Big)\,dP
      = \int_J h_{\mu_j}(\sigma)\,dP = h_\mu(\sigma).$$

The first and last equalities come from the ergodic decomposition theorem,
and the second one from Theorem 14. The same argument applies to
$K_Z(\omega)$ and $H(\omega)$. □
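A simple worked case may help to see what the average result says. The example below is ours and is only a sketch: it takes a mixture of two Bernoulli measures $\mu_p$ and $\mu_q$ on $\{0,1\}^{\mathbb{N}}$, which is invariant but not ergodic, and uses the fact that the entropy is affine in the measure.

```latex
\begin{align*}
\mu &= \lambda\,\mu_p + (1-\lambda)\,\mu_q,
  \qquad 0 < \lambda < 1,\quad p \ne q, \\
h(p) &= -p\log p - (1-p)\log(1-p), \\
K(\omega) &= h(p) \ \ \mu_p\text{-a.e.},
  \qquad K(\omega) = h(q) \ \ \mu_q\text{-a.e.}, \\
\int_\Omega K(\omega)\,d\mu
  &= \lambda\,h(p) + (1-\lambda)\,h(q) = h_\mu(\sigma).
\end{align*}
```

Here the complexity of a single string takes one of two values, depending on the ergodic component it belongs to, and only its average over $\mu$ recovers the entropy.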
4 Dynamical systems



In this Section we apply the features of coding algorithms and the results of the 
previous section to define a notion of complexity for orbits of dynamical systems 
and prove some relations with the Kolmogorov-Sinai entropy. 

The relations we can prove will be useful as a theoretical support for the 
interpretation of the experimental and numerical results. The results which we 
will explain in this section are meaningful in the positive entropy case. The null 
entropy cases are harder to deal with, and we present some results in the next 
section. 

4.1 Dynamical systems and partitions 

Now we consider a dynamical system $(X, \mu, T)$, where $X$ is a compact metric
space, $T : X \to X$ is a continuous map and $\mu$ is a Borel probability
measure on $X$ invariant for $T$. If $\alpha = \{A_1, \dots, A_n\}$ is a
measurable partition of $X$ (a partition of $X$ where the sets are measurable)
then we can associate to $(X, \mu, T)$ a symbolic dynamical system
$(\Omega_\alpha, \mu_\alpha, \sigma)$ (called a symbolic model of $(X, T)$). By
this association many results about symbolic dynamical systems can be
translated to dynamical systems over metric spaces in which a partition has
been chosen.

The set $\Omega_\alpha$ is a subset of $\{1, \dots, n\}^{\mathbb{N}}$ (the
space of infinite strings made of symbols from the alphabet
$\{1, \dots, n\}$). To a point $x \in X$ we associate a string
$\omega = (\omega_j)_{j \in \mathbb{N}} = \varphi_\alpha(x)$ defined by

    $$\varphi_\alpha(x) = \omega \iff \forall j \in \mathbb{N},\
      T^j(x) \in A_{\omega_j}.$$

Since $\alpha$ is a partition, the set $\varphi_\alpha(x)$ contains only one
element, so $\varphi_\alpha$ defines a function associating an infinite string
to each point $x \in X$. The measure $\mu$ on $X$ induces a measure
$\mu_\alpha$ on the associated symbolic dynamical system. The measure is first
defined on the cylinders^5

    $$C(\omega^{(k,n)}) = \{\bar\omega \in \Omega_\alpha :
      \bar\omega_i = \omega_i \text{ for } k \le i \le n-1\}$$

by

    $$\mu_\alpha(C(\omega^{(k,n)})) =
      \mu\Big(\bigcap_{i=k}^{n-1} T^{-i}(A_{\omega_i})\Big)$$

and then extended by the classical Kolmogorov theorem about product measures
to a measure $\mu_\alpha$ on $\Omega_\alpha$. Moreover, if $(X, \mu, T)$ is
ergodic then $(\Omega_\alpha, \mu_\alpha, \sigma)$ is ergodic and
$h_\mu(T, \alpha)$ (the Kolmogorov-Sinai entropy relative to the partition
$\alpha$) on $X$ equals $h_{\mu_\alpha}(\sigma)$ on $\Omega_\alpha$.

We now define the complexity of an orbit with respect to a partition. The 
above considerations will allow us to apply the results on symbolic dynamical 
systems to general dynamical systems with a partition. 
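The passage from an orbit to its symbolic string is easy to sketch in code. In the fragment below the map and the partition are hypothetical choices made only for illustration (the logistic map $x \mapsto 4x(1-x)$ on $[0,1]$ with the two-set partition $\{[0,1/2), [1/2,1]\}$), the helper names are ours, and the sets of the partition are numbered from 0 rather than from 1.

```python
def symbolic_orbit(T, x0, cell_index, n):
    """First n symbols of the string associated to x0: the j-th symbol is
    the index of the partition set containing T^j(x0)."""
    symbols = []
    x = x0
    for _ in range(n):
        symbols.append(cell_index(x))
        x = T(x)
    return symbols


def logistic(x: float) -> float:
    """Hypothetical example map on [0, 1]."""
    return 4.0 * x * (1.0 - x)


def two_cell_partition(x: float) -> int:
    """Index (0 or 1) of the set of {[0, 1/2), [1/2, 1]} containing x."""
    return 0 if x < 0.5 else 1


print(symbolic_orbit(logistic, 0.3, two_cell_partition, 5))  # [0, 1, 1, 1, 0]
```

The resulting list of symbols is the finite string $\omega^n$ to which a compression algorithm can then be applied. In floating-point arithmetic only a moderate number of iterates of a chaotic map follow the true orbit, so long simulated strings should be read as typical strings of the system rather than as the exact coding of $x_0$.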

5 We recall that $\omega^{(k,n)} = (\omega_i)_{k \le i < n} =
(\omega_k, \omega_{k+1}, \dots, \omega_{n-1})$.
Definition 16. Let $\omega = \varphi_\alpha(x)$ for a given partition
$\alpha$. We define the complexity of the orbit of a point $x \in X$, with
respect to the partition $\alpha$, as

    $$AIC(x, \alpha, n) = AIC(\omega^n), \qquad K(x, \alpha) = K(\omega),$$

where the information is measured by the AIC. We also define

    $$I_Z(x, \alpha, n) = I_Z(\omega^n), \qquad K_Z(x, \alpha) = K_Z(\omega),$$

where the information is measured by $Z$. The definition of empirical entropy
can also be extended to $(x, \alpha)$ by setting

    $$H(x, \alpha) = H(\omega).$$

Theorem 17. If $Z$ is an optimal coding, $(X, \mu, T)$ is an ergodic dynamical
system and $\alpha$ is a measurable partition of $X$, then for $\mu$-almost all
$x$

    $$K_Z(x, \alpha) = h_\mu(T, \alpha),$$

where $h_\mu(T, \alpha)$ is the Kolmogorov-Sinai entropy of $(X, \mu, T)$ with
respect to the measurable partition $\alpha$.

The proof of the above Theorem follows easily from the following lemmas.

Lemma 18. If $(X, T, \mu)$ is ergodic, then for almost each point $x \in X$,
$K(x, \alpha) = h_\mu(T, \alpha)$.

This Lemma was already proved by Brudno (see Lemma 2.6, page 137).

Lemma 19. If $(X, T, \mu)$ is ergodic, then for almost each point $x \in X$

    $$H(x, \alpha) = h_\mu(T, \alpha).$$

Proof. In the associated symbolic system
$h_\mu(T, \alpha) = h_{\mu_\alpha}(\sigma)$. Moreover, for almost each
$\omega \in \Omega_\alpha$, it holds that
$H(x, \alpha) = H(\omega) = h_{\mu_\alpha}(\sigma)$, where
$x = \varphi_\alpha^{-1}(\omega)$ (Prop. 10). If we consider
$Q_{\Omega_\alpha} := \{\omega \in \Omega_\alpha :
H(\omega) = h_{\mu_\alpha}(\sigma)\}$ and
$Q := \varphi_\alpha^{-1}(Q_{\Omega_\alpha})$, we have

    $$\forall x \in Q \quad H(x, \alpha) = H(\varphi_\alpha(x))
      = h_{\mu_\alpha}(\sigma) = h_\mu(T, \alpha).$$

According to the way in which the measure $\mu_\alpha$ is constructed, we have
$\mu(Q) = \mu_\alpha(Q_{\Omega_\alpha}) = 1$. □

Proof of Theorem 17. The proof of Theorem 17 follows as before from the
remark that $K(x, \alpha) \le K_Z(x, \alpha) \le H(x, \alpha)$ (Remarks 11 and
12) and from Lemmata 18 and 19. □

As before, we state the corresponding result in the non-ergodic case.
Theorem 20. If for the dynamical system $(X, T, \mu)$ the measure $\mu$ is only
$T$-invariant, then, if $Z$ is an optimal compression algorithm, for any
measurable partition $\alpha$ it holds

    $$h_\mu(T, \alpha) = \int_X K_Z(x, \alpha)\,d\mu
      = \int_X H(x, \alpha)\,d\mu = \int_X K(x, \alpha)\,d\mu.$$

Proof. The proof follows that of Theorem 15, using the definition of the
complexity of infinite orbits of a dynamical system through the complexity of
the associated infinite symbolic orbit, and the previous lemmata. The
measurability of the partition $\alpha$ is essential to obtain the
measurability of the function $\varphi_\alpha : X \to \Omega_\alpha$. □

Corollary 21. Under the assumptions of the previous Theorem, if moreover
$\alpha$ is a generating partition, then

    $$h_\mu(T) = \int_X K_Z(x, \alpha)\,d\mu
      = \int_X H(x, \alpha)\,d\mu = \int_X K(x, \alpha)\,d\mu.$$

Remarks. The theorems above show that all the various information measures
we have defined in Section 2 agree when we study the long time asymptotic
behavior of the information necessary to describe a generic orbit of a positive
entropy system. Theorem 20 shows that if a system has an invariant measure,
its entropy with respect to a given partition can be found by averaging the
complexity of its orbits over the invariant measure. Thus the entropy may
alternatively be defined as the average orbit complexity. However, if we fix a
single point, its orbit complexity is not yet well defined, because it depends
on the choice of a partition. It is not possible to get rid of this dependence
by taking the supremum over all partitions (as in the construction of the
Kolmogorov-Sinai entropy), because this supremum goes to infinity for each
orbit that is not eventually periodic (see Assertion 2.8).

We sketch how this difficulty may be overcome in two ways:

1) by considering open covers instead of partitions, as in [19] and related
work. We recall that, since the sets in an open cover can have non-empty
intersection, a step of the orbit of $x$ can be contained at the same time in
more than one open set of the cover. This implies that an orbit may have an
infinite family of possible symbolic codings, among which we choose the
"simplest one". Then we can define the complexity of the orbit of a point as
the supremum of the complexities obtained with respect to all possible open
covers. This definition has the very nice property of being invariant up to
topological equivalence of dynamical systems. This definition of orbit
complexity equals the entropy for almost each point of a compact ergodic
system.

2) by considering only a particular class of partitions and defining the orbit
complexity of a point as the supremum of the orbit complexity over that class.
This can easily be done if the space is $\mathbb{R}^n$, by considering
partitions generated by intersections of half spaces with rational coordinates
(polyhedral partitions). From the following Lemma it easily follows that the
corresponding notion of orbit complexity equals the entropy for almost each
point of a compact ergodic system.



Let Pi be a family of measurable partitions such that lim diam(Pi) = 0. If 

i — >oo 

we consider lim sup Kz(x,Pi) we have the following 

i — >oo 

Lemma 22. If (X, (i,T) is compact and ergodic, Z is optimal, then for fi-almost 
all points x G X, lim sup Kz(x,Pi) =limsup K(x,Pi) = h^iT). 

i — >oo i — >oo 

Proof. For each $i$, the points for which $K_Z(x,P_i) \neq h_\mu(T,P_i)$ form a set of null
measure, by the theorem for a single partition recalled earlier. Since the family
of partitions is countable, excluding all these points removes a set of measure
zero. For all the other points we have $K_Z(x,P_i) = h_\mu(T,P_i)$ for every $i$, and
then $\limsup_{i\to\infty} K_Z(x,P_i) = \limsup_{i\to\infty} h_\mu(T,P_i)$. Since $X$ is compact and the
diameter of the partitions $P_i$ tends to 0, we have that
$\limsup_{i\to\infty} h_\mu(T,P_i) = h_\mu(T)$ (see e.g. [p3| page 170). The same argument
holds for $K(x,P_i)$, and the statement is proved. □

The previous lemma makes the following definition possible. If $(X,\mu,T)$ is
compact and ergodic and $Z$ is optimal, then for $\mu$-almost all points $x \in X$ and for
countable families of measurable partitions $\{P_i\}_{i\in\mathbb{N}}$ such that
$\lim_{i\to\infty} \mathrm{diam}(P_i) = 0$, the complexity of the orbit of a point $x \in X$ is

$$K_Z(x) = \limsup_{i\to\infty} K_Z(x,P_i) .$$



5 Weakly chaotic dynamical systems 

A weakly chaotic dynamical system is a system all of whose physically relevant
invariant measures have zero Kolmogorov-Sinai entropy, but whose dynamics is
nevertheless not ordered. Thus, the complexity defined in Def. [I] always gives a
null value and it is not a good observable to characterize these systems.

A first step towards a meaningful observable is to look directly
at the asymptotic behavior of the information necessary to describe the
orbit of a point$^6$. One of the main tools in the proof of Brudno's main theorem,
which states the equality between the complexity for almost any initial
condition and the Kolmogorov-Sinai entropy, is the Birkhoff ergodic theorem, which
gives a relation between spatial and temporal averages for measurable functions
defined on the state space. Thus our pointwise approach to the asymptotic
behavior of the complexity corresponds to the pointwise results of the ergodic
theorem.

6 We recall that we do this with respect to a fixed partition (see the remarks of the previous
section).

From this point of view, the pointwise approach in weakly chaotic dynamical
systems should be based on general ergodic theorems, in which the temporal
average is taken with non-linear weights. This is, to our knowledge, a very
delicate point in ergodic theory, and there are some negative results, for
example in the case of dynamical systems defined on a space $X$ with an invariant
measure $\mu$ such that $\mu(X) = +\infty$. Let $(X, T, \mu)$ be such a dynamical system;
it is impossible to define a sequence $\{a(n)\}$ of integers, monotonically
converging to infinity and with $a(n)/n \to 0$ as $n \to \infty$, such that for all functions
$f \in L^1(X,\mu)$

$$\lim_{n\to+\infty} \frac{1}{a(n)} \sum_{i=0}^{n-1} f(T^i(x)) = C$$

for almost any $x \in X$, where $C$ is a positive finite constant (PJ). This result is
applicable, for example, to the family of Manneville maps with parameter $z > 2$
(see Sections 6.2 and 7.1 for the description of the maps and references [21],
[pO|, |p|), where the physically relevant invariant measure is infinite. Hence this
is an indication that a pointwise approach to the complexity of the orbits of
the Manneville maps might not give a consistent result.

We remark that we are only looking at the behavior of ergodic averages of a
single function, so the generality of the ergodic theorems could be more than
our aims require. Nevertheless, using the results of || and Jlj| for the Manneville map
with $z = 2$, we expect that for almost any point $x \in X$ it is impossible to find a
sequence $a(n)$ of integers, converging to infinity and with $a(n)/n \to 0$ as
$n \to \infty$, such that the limit

$$\lim_{n\to\infty} \frac{AIC(\omega^n)}{a(n)}$$

exists and is strictly positive, where $\omega^n$ denotes the first $n$ digits of the
symbolic string associated to the point $x$ using a fixed partition. Moreover, we
expect the superior limit to be infinity and the inferior limit to be zero for almost
any initial condition, when the two limits are not both either zero or infinity.
We believe that this result can be extended to the cases $z > 2$.

For this reason, we will suggest a slight modification of Definition [l]
and we will show that in the case of the Manneville-type maps this new index
gives a classification for the maps of the family.

First, we will sketch the landscape in the case of general symbolic dynamical 
systems. 



5.1 Symbolic dynamical systems 

The following definitions are inspired by the example of the Manneville maps
with $z > 2$, for which there is no physically relevant invariant probability
measure, but for which there are results about the Lebesgue measure, which is
physically relevant but not invariant.

Hence in the following we will consider dynamical systems with a not
necessarily invariant reference probability measure $\mu$. Let $(\Omega, \sigma)$ be a dynamical
system and assume that there is a physically relevant measure $\mu$ which is not
necessarily invariant. For instance, if the space $X$ is the unit interval $[0,1]$,
then we will consider $\mu$ to be the Lebesgue measure on $[0,1]$. We consider the
following index:

Definition 23 (q-entropy). Let $I : A^* \to \mathbb{N}$ be an information measure. Let
$q$ be a positive real number. We call q-entropy

$$h^q(\Omega) = \limsup_{n\to\infty} \int_\Omega \frac{I(\omega^n)}{n^q}\, d\mu . \qquad (6)$$

If $I = AIC$, then we denote $h^q(\Omega)$ by $h^q_{AIC}(\Omega)$. If $I = I_Z$, with $Z$ a recursive
coding procedure, then we denote $h^q(\Omega)$ by $h^q_Z(\Omega)$.

Theorem 24. For all recursive coding procedures $Z$ and all $q > 0$, we have

$$h^q_Z(\Omega) \geq h^q_{AIC}(\Omega) .$$

Proof. From inequality (|^), we have that there exists a constant $M$ not
depending on Z such that

$$\frac{AIC(\omega^n)}{n^q} \leq \frac{I_Z(\omega^n)}{n^q} + \frac{M}{n^q} .$$

From this the theorem easily follows. □

Definition 25 (Chaos index). We call chaos index of the symbolic dynamical
system $(\Omega, C, \mu, \sigma)$ the number $q(\Omega) = \inf\{p > 0 \mid h^p(\Omega) = 0\} \in [0,1]$. The
indexes $q_{AIC}$ and $q_Z$ are defined as above.

Corollary 26. For all recursive coding procedures $Z$,

$$q_Z(\Omega) \geq q_{AIC}(\Omega) .$$
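
As a worked illustration of how the chaos index reads off a growth exponent, here is a sketch under a simplifying assumption that is not made in the text: suppose the mean information grows exactly like a power law, $\int_\Omega I(\omega^n)\,d\mu \sim C\,n^p$ for some constant $C > 0$ and some $0 < p \leq 1$. Then

```latex
h^q(\Omega)
  = \limsup_{n\to\infty} \frac{1}{n^q}\int_\Omega I(\omega^n)\,d\mu
  = \limsup_{n\to\infty} C\, n^{p-q}
  = \begin{cases}
      0       & \text{if } q > p, \\
      C       & \text{if } q = p, \\
      +\infty & \text{if } q < p,
    \end{cases}
\qquad\text{hence}\qquad
q(\Omega) = \inf\{q > 0 \mid h^q(\Omega) = 0\} = p .
```

Under the same kind of assumption, logarithmic growth of the information gives chaos index 0 and linear growth gives chaos index 1, so the index separates the intermediate regimes lying between the periodic and the positive entropy case.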

5.2 General dynamical systems 

Let $(X, T)$ be a dynamical system and let $\mu$ be a reference probability measure
as above, which is not supposed to be invariant. Let $(\Omega_\alpha, \mu_\alpha)$ be a symbolic
model of $(X, T)$, relative to the partition $\alpha$ of the space $X$, and $\varphi_\alpha(x)$ the
symbolic string associated to any point $x \in X$.

Definition 27 (q-entropy relative to a partition). As above let $I : A^* \to \mathbb{N}$
be an information measure, either $AIC$ or $I_Z$. Let $\alpha$ be a partition of $X$ and
$I(x,\alpha,n) = I(\omega^n)$, where $\omega = \varphi_\alpha(x)$. Then we call q-entropy relative to the
partition $\alpha$

$$h^q(X,\alpha) = \limsup_{n\to\infty} \frac{1}{n^q} \int I(x,\alpha,n)\, d\mu . \qquad (7)$$

In the example of the family of Manneville maps, that is the simplest model 
of intermittent weak chaos, the average with respect to the Lebesgue measure 
of the information plays a crucial role in the classification of the maps in the 
family. Following this example, we believe that the following index is a
particularly meaningful indicator in the study of intermittent weakly chaotic
dynamical systems.

Definition 28 (Intermittent Chaos index). We call intermittent chaos
index of the dynamical system $(X, T, \mu)$ with respect to a partition $\alpha$ the number

$$q(X,T,\alpha) = \inf\{p > 0 \mid h^p(X,\alpha) = 0\} \in [0,1] .$$

The indexes $q_{AIC}$ and $q_Z$ are defined as above.

Corollary 29. For all compression algorithms $Z$,

$$q_Z(X,T,\alpha) \geq q_{AIC}(X,T,\alpha) .$$

In the next section we will apply these definitions to the family of Manneville
maps (for the definition see the next section), choosing the LZ77 compression
algorithm ( Iplj , [p2| ) and a generating partition $\alpha$. As a result, in
the weakly chaotic case (for $z > 2$),

$$q_{LZ77}(X,T,\alpha) = q_{AIC}(X,T,\alpha) = \frac{1}{z-1} .$$

6 Compression algorithms 
6.1 The algorithm LZ77 

The Ziv-Lempel compression scheme LZ77 with infinite window (PH) is the
one from which almost all practical adaptive dictionary encoders are derived ( B ) .
A dictionary of an input string is the set of words (i.e. groups of consecutive
symbols) into which the algorithm parses the input string.

The essence is that phrases (i.e. sets of consecutive words in the string to 
be encoded) are replaced with a pointer to where they have occurred earlier in 
the input string. Novel words and phrases can also be constructed from parts 
of earlier words. 

In the LZ77 compression algorithm, the new word is defined as a pair
(pointer, symbol). The pointer refers to a phrase contained in the part
of the input string which precedes the current position of the front end. As
an example, let the alphabet $A$ be the set $\{a_1, \dots, a_r\}$ and consider an input
string $\omega \in A^*$. As usual, $\omega^n = (\omega_1 \cdots \omega_n)$ is the substring of $\omega$ of length $n$
containing its first $n$ symbols.

Consider some step $h$ of the coding procedure.

Suppose the first $p$ symbols $(\omega_1 \cdots \omega_p)$ have already been encoded. The
dictionary now contains $h$ words $\{e_1, \dots, e_h\}$. Thus, the current position of
the front end is the $(p+1)$-th site in the input string and the next word in the
dictionary will be labelled as the $(h+1)$-th word $e_{h+1}$.

The algorithm selects this new word as the longest word which can be
obtained by adding a single character $a$ chosen in the alphabet $A$ to a phrase $\rho$
starting in the already encoded substring $(\omega_1 \cdots \omega_p)$. Hence, the word $e_{h+1}$ has
as a prefix the phrase $\rho$, followed by $a$ as an ending symbol ($e_{h+1} = \rho\, a$).

Once the new word $e_{h+1}$ has been found, the algorithm outputs a binary
encoding of the triplet $(s_{h+1}, l_{h+1}, a)$, where $s_{h+1}$ is the starting position of the
prefix $\rho$ of the new word in the string $(\omega_1 \cdots \omega_p)$, $l_{h+1}$ is the length of the new
word $e_{h+1}$ and the symbol $a$ from the alphabet is the last character of $e_{h+1}$.

The following example shows how the algorithm LZ77 encodes the input
stream

$\omega = (aababbbbaababba \dots)$ .

Let $A = \{a, b\}$ be the source alphabet.

The output is the binary encoding of the following triplets. The first column
is the dictionary index number of the codeword whose triplet is shown in the
second column of the same line. For easier reading, we add a third column which
shows each encoded word in the original stream $\omega$, but we remark that it is not
contained in the output file:

| Index | Triplet | Word |
|---|---|---|
| 1 | (1,1,'a') | [a] |
| 2 | (1,2,'b') | [ab] |
| 3 | (2,3,'b') | [abb] |
| 4 | (5,3,'a') | [bba] |
| 5 | (2,6,'a') | [ababba] |

and so on.
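
The parsing rule above can be sketched in a few lines of Python. This is a minimal, quadratic-time illustration and not the implementation used for the experiments. The function name `lz77_parse` is hypothetical, and three conventions are assumptions: positions are 1-based, the start position is reported as 1 when the prefix is empty, and the length entry counts the whole word (prefix plus closing symbol), so that the last word ababba has length 6. The prefix must start in the already encoded part but is allowed to run past it.

```python
from typing import List, Tuple


def lz77_parse(s: str) -> List[Tuple[int, int, str]]:
    """Parse s into LZ77 words, returning triplets (start, word_length, last_symbol).

    start is the 1-based position of the earliest longest prefix match;
    it is set to 1 when the prefix is empty (a convention assumed here).
    """
    triplets = []
    n = len(s)
    p = 0  # number of symbols already encoded
    while p < n:
        best_len, best_start = 0, 0
        max_len = n - p - 1  # always keep one symbol to close the word
        for start in range(p):  # the prefix must start in the encoded part
            length = 0
            while length < max_len and s[start + length] == s[p + length]:
                length += 1
            if length > best_len:  # strict '>' keeps the earliest match
                best_len, best_start = length, start
        word_len = best_len + 1
        triplets.append((best_start + 1, word_len, s[p + word_len - 1]))
        p += word_len
    return triplets


if __name__ == "__main__":
    for index, triplet in enumerate(lz77_parse("aababbbbaababba"), start=1):
        print(index, triplet)
    # 1 (1, 1, 'a')
    # 2 (1, 2, 'b')
    # 3 (2, 3, 'b')
    # 4 (5, 3, 'a')
    # 5 (2, 6, 'a')
```

A production encoder would replace the inner search with a suffix tree or a hash chain; the brute-force search is kept here only because it mirrors the definition directly.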

Now we will recall some results from [^6| , concerning the optimality of the 
LZ77 algorithm. 

Lemma 30. Let $t$ denote the number of words in which the algorithm LZ77
parses the string $\omega^n$. Set $m = n - N$ where $N = \max_{j=1,\dots,r} n_j$, and $n_j$ is the
number of occurrences of the symbol $a_j \in A$. Then $t \leq 2(m+1)$.

Theorem 31. Let $t$ denote the number of words in which the algorithm LZ77
parses the string $\omega^n$.

(i) For all $k \geq 1$ it holds

$$I_{LZ77}(\omega^n) = |LZ77(\omega^n)| \leq n H_k(\omega^n) + 3 t \log_2\left(\frac{n}{t}\right) + O\left((k-1)\,t + t \log_2\log_2\left(\frac{n}{t}\right)\right) . \qquad (8)$$

(ii) The algorithm LZ77 is optimal and for all $n > 1$, for all $\omega^n \in A^n$ and for
all $k \geq 1$ it holds

$$\frac{I_{LZ77}(\omega^n)}{n} \leq H_k(\omega^n) + O\left(\frac{\log_2\log_2(n)}{\log_2(n)} + \frac{k-1}{\log_2(n)}\right) . \qquad (9)$$

(iii) The algorithm LZ77 is 8-asymptotically optimal with respect to $H_1$ and
for any string $\omega^n$ such that $H_1(\omega^n) \neq 0$ it holds

$$I_{LZ77}(\omega^n) \leq 8\, n H_1(\omega^n) + o\left(t \log_2\log_2\left(\frac{n}{t}\right)\right) . \qquad (10)$$

6.2 LZ77 on the Manneville map



Now we are ready to prove the following theorem, which links the Information
Content obtained via the algorithm LZ77 to the AIC on the symbolic orbits
of the Manneville map. We will study the dynamical system $([0,1], T_z)$ where
$T_z(x) = x + x^z \pmod 1$ and $z > 1$. The reference measure is the Lebesgue
measure on the unit interval.

The Manneville map was introduced by P. Manneville in |2£| as an example 
of a discrete dissipative dynamical system with intermittency: there is an alter- 
nation between long regular phases, called laminar, and short irregular phases, 
called turbulent. This behavior has been observed in fluid dynamics experiments 
and in chemical reactions. 

In order to state and prove our results, we recall some useful lemmas coming 
from probability theory. 

Lemma 32. (Jensen's Inequality) Let $I$ be a closed interval in $\mathbb{R}$ and
$u : I \to \mathbb{R}$ be convex and continuous at the endpoints of $I$. If $X$ is a
random variable which takes its values in $I$, then $E[u \circ X] \geq u(E[X])$.

Lemma 33. If $X$ and $Y$ are two real random variables such that $X \geq 0$ and $Y \geq 0$,
then

$$E[\max\{X,Y\}] \geq \max\{E[X], E[Y]\} .$$

Theorem 34. Consider the dynamical system $([0,1], T_z)$ driven by the
Manneville map $T_z(x) = x + x^z \pmod 1$, with $z > 1$. Let $\tilde{x} \in (0,1)$ be such that
$\tilde{x} + \tilde{x}^z = 1$. Consider the partition $\alpha = \{[0,\tilde{x}], (\tilde{x},1]\}$ of the unit interval $[0,1]$.
If $\omega$ is a symbolic orbit drawn from the Manneville map, with respect to the
partition $\alpha$, then

$$\begin{cases} E[I_{LZ77}(\omega^n)] \sim n & \text{if } z < 2 \\ O(n^p) \leq E[I_{LZ77}(\omega^n)] \leq O(n^p \log_2(n)) & \text{if } z > 2 \end{cases} \qquad (11)$$

where $p = \frac{1}{z-1}$ and the measure is the usual Lebesgue measure on the interval.

Proof. We will study the two cases separately.

If $z < 2$: a result of [EU shows that for the expectation value of the AIC of
a symbolic orbit of the Manneville map with $z < 2$, with respect to the
Lebesgue measure on the interval, it holds that

$$E[AIC(\omega^n)] \sim n .$$

Since $AIC(\omega^n) \leq I_{LZ77}(\omega^n) \leq O(n)$, in this case we have that
$E[I_{LZ77}(\omega^n)] \sim n$.

If $z > 2$: From Theorem 31 (iii), we know that the algorithm LZ77 is
asymptotically optimal with respect to $H_1$, that is

$$I_{LZ77}(\omega^n) \leq 8\, n H_1(\omega^n) + o\left(t \log_2\log_2\left(\frac{n}{t}\right)\right) , \qquad (12)$$

where $t$ is the number of words in the LZ77 parsing of $\omega^n$. Thus, for any
sequence $\omega^n$, it holds

$$E[I_{LZ77}(\omega^n)] \leq 8\, n\, E[H_1(\omega^n)] + E\left[t \log_2\log_2\left(\frac{n}{t}\right)\right] . \qquad (13)$$

First, we will prove that $n\, E[H_1(\omega^n)]$ is bounded by $O(n^p \log_2(n))$ with
$p = \frac{1}{z-1}$. Then, we will give an estimate for $E[t \log_2\log_2(\frac{n}{t})]$, so completing
the proof.

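By definition, $H_1(\omega^n) = -\left(\frac{N_n}{n}\log_2\left(\frac{N_n}{n}\right) + \left(1 - \frac{N_n}{n}\right)\log_2\left(1 - \frac{N_n}{n}\right)\right)$, where $N_n$
is the number of occurrences of the event $\mathcal{E}$ = {passage through $(\tilde{x}, 1]$}.

In [15] it has been proved that if $z > 2$ then $E[N_n] \sim n^p$.

Therefore, for the first order empirical entropy of a symbolic orbit drawn
by the Manneville map with respect to the Lebesgue measure, we can
apply Jensen's inequality and obtain:

$$E[H_1(\omega^n)] \leq -\left(E\left[\frac{N_n}{n}\log_2\left(\frac{N_n}{n}\right)\right] + E\left[\left(1 - \frac{N_n}{n}\right)\log_2\left(1 - \frac{N_n}{n}\right)\right]\right)$$

$$\leq -\left(E\left[\frac{N_n}{n}\right]\log_2\left(E\left[\frac{N_n}{n}\right]\right) + \left(1 - E\left[\frac{N_n}{n}\right]\right)\log_2\left(1 - E\left[\frac{N_n}{n}\right]\right)\right)$$

$$\leq -\left(n^{p-1}\log_2\left(n^{p-1}\right) + \left(1 - n^{p-1}\right)\log_2\left(1 - n^{p-1}\right)\right) .$$

Consequently, we can easily verify that

$$8\, n\, E[H_1(\omega^n)] \leq -8\, n^p \log_2\left(n^{p-1}\right) - 8\,(n - n^p)\log_2\left(1 - n^{p-1}\right) = 8\, n^p \log_2\left(n^{1-p} - 1\right) - 8\, n \log_2\left(1 - n^{p-1}\right) .$$

Since $p < 1$, it holds

$$8\, n^p \log_2\left(n^{1-p} - 1\right) - 8\, n \log_2\left(1 - n^{p-1}\right) = O(n^p \log_2(n)) . \qquad (14)$$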
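
The estimate (14) can be checked with a short expansion. The following is a sketch of the omitted computation, writing $u = n^{p-1}$, which tends to 0 as $n \to \infty$ because $p < 1$:

```latex
\begin{align*}
\log_2\left(n^{1-p} - 1\right) &= (1-p)\log_2(n) + \log_2(1-u), \\
-\log_2(1-u) &= \frac{u}{\ln 2} + O(u^2), \\
8\,n^p \log_2\left(n^{1-p} - 1\right) - 8\,n \log_2(1-u)
  &= 8(1-p)\, n^p \log_2(n) + \frac{8\, n^p}{\ln 2} + O\left(n^{2p-1}\right) .
\end{align*}
```

Since $2p - 1 < p$, the leading term is $8(1-p)\, n^p \log_2(n)$, which gives the bound $O(n^p \log_2(n))$.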



Now we will prove that, for $\omega^n$ a symbolic orbit of the Manneville map
with $z > 2$, if $t$ is the number of words in the LZ77 parsing of $\omega^n$, it holds

$$E\left[t \log_2\log_2\left(\frac{n}{t}\right)\right] \leq O(n^p \log_2(n)) . \qquad (15)$$

We apply Lemma 30, with alphabet $A = \{0, 1\}$ where 1 is the symbol
associated to the event $\mathcal{E}$ = {passage through $(\tilde{x}, 1]$}, which appears in
$\omega^n$ with mean probability $\frac{E[N_n]}{n} = n^{p-1}$, and 0 is the symbol associated to
the event not $\mathcal{E}$.

Thus, $m = n - \max\{n_0, n_1\}$ and $t \leq 2(m+1) = 2(n - \max\{n_0, n_1\} + 1)$.
Thanks to Lemma 33, we obtain the following estimates:

$$E[t] \leq 2\,E[m] + 2 = 2n + 2 - 2\,E[\max\{n_0, n_1\}] \leq 2n + 2 - 2\max\{E[n_0], E[n_1]\} = 2n + 2 - 2\max\{n - n^p, n^p\} = 2\,n^p + 2 . \qquad (16)$$

Since $t \geq 1$ implies $\log_2\log_2(\frac{n}{t}) \leq \log_2\log_2(n) \leq \log_2(n)$, the inequality (15)
can now be easily verified.

From the estimates (14) and (15) together with (13), it follows that

$$E[I_{LZ77}(\omega^n)] \leq O(n^p \log_2(n)) .$$

Finally, since the AIC is the ideal information content of a string, it is
$I_{LZ77}(\omega^n) \geq AIC(\omega^n)$ and the same inequality relates the expectation
values.

In @ and @, it has been proved that $E[AIC(\omega^n)] \geq O(n^p)$. This
completes the proof.

□ 



6.3 The algorithm CASToRe 

We have created and implemented a particular compression algorithm, which we
called CASToRe, that is a modification of the well-known LZ78 algorithm ([32]).
Its theoretical advantages with respect to LZ78 are shown in 0, j^j: it is a
sensitive measure of the information content of low entropy sequences. This
is why it is called CASToRe: Compression Algorithm, Sensitive To Regularity.

As proved in Theorem 4.1 in Q, the information $I_Z$ of a constant
sequence $s^n$ of length $n$ is $4 + 2\log(n+1)[\log(\log(n+1)) - 1]$
if the algorithm $Z$ is CASToRe. The theory predicts that the best possible
information for a constant sequence of length $n$ is $AIC(s^n) = \log(n) + \mathrm{const}$.
In O, it is shown that the algorithm LZ78 encodes a constant sequence $n$ digits
long to a string of length about $\mathrm{const} + n^{1/2}$ bits; so, we cannot expect
LZ78 to be able to distinguish a sequence whose information grows like $n^\alpha$
($\alpha < 1/2$) from a constant or periodic one. This motivates the choice of using
CASToRe.

Now we briefly describe the internal running of CASToRe. 

Like the algorithm LZ77, the algorithm CASToRe is based on an adaptive
dictionary (0). One of the basic differences in the coding procedure is that
the algorithm LZ77 splits the input string into overlapping words, while the
algorithm CASToRe (like the algorithm LZ78) parses the input string into
non-overlapping words.

At the beginning of the encoding procedure, the dictionary contains only the
alphabet. In order to explain the principle of encoding, let us consider a step
$h$ within the encoding process, when the dictionary already contains $h$ words
$\{e_1, \dots, e_h\}$.

The new word is defined as a pair (prefix pointer, suffix pointer). The two
pointers refer to two (not necessarily different) words $\rho_p$ and $\rho_s$ chosen
among the ones contained in the current dictionary as follows. First, the
algorithm reads the input stream starting from the current position of the front
end, looking for the longest word $\rho_p$ matching the stream. Then, it looks for
the longest word $\rho_s$ such that the joint word $\rho_p \rho_s$ matches the stream. The new
word $e_{h+1}$ which will be added to the dictionary is then $e_{h+1} = \rho_p \rho_s$.

The output file contains an ordered sequence of the binary encodings of the
pairs $(i_p, i_s)$ such that $i_p$ and $i_s$ are the dictionary index numbers corresponding
to the prefix word $\rho_p$ and to the suffix word $\rho_s$, respectively. The pair $(i_p, i_s)$
refers to the new encoded word $e_{h+1}$ and has its own index number $i_{h+1}$.

The following example shows how the algorithm CASToRe encodes the input
stream

$\omega = (abcababccabc \dots)$ .

Let the source alphabet be $A = \{a, b, c\}$.

The output is the binary encoding of the pairs contained in the
second column. The first column is the dictionary index number of the encoded
word shown in the same line. For easier reading, we add a third column which
shows each encoded word in the original stream $\omega$, but it is not contained in
the output file.

First, the dictionary is loaded:

| Index | Pair | Word |
|---|---|---|
| 1 | (0,'a') | [a] |
| 2 | (0,'b') | [b] |
| 3 | (0,'c') | [c] |

Then, the encoding procedure starts:

| Index | Pair | Word |
|---|---|---|
| 4 | (1,2) | [ab] |
| 5 | (3,4) | [cab] |
| 6 | (4,3) | [abc] |
| 7 | (5,3) | [cabc] |

and so on.
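
The pairing rule of CASToRe can be sketched as follows. This is a minimal illustration of the parsing step only, not the authors' implementation, which stores the dictionary in a tree. The function name `castore_parse` is hypothetical, and the stream used below is an assumption chosen so that the fourth new word is cabc, as in the table.

```python
from typing import Dict, List, Tuple


def castore_parse(s: str, alphabet: str) -> List[Tuple[int, int, str]]:
    """Return (prefix_index, suffix_index, new_word) for every word added.

    Dictionary indices are 1-based; the alphabet occupies the first entries.
    """
    index: Dict[str, int] = {ch: i + 1 for i, ch in enumerate(alphabet)}

    def longest_word(pos: int) -> str:
        # Longest dictionary word matching the stream at position pos.
        best = ""
        for word in index:
            if len(word) > len(best) and s.startswith(word, pos):
                best = word
        return best

    pairs = []
    pos = 0
    while pos < len(s):
        prefix = longest_word(pos)
        suffix = longest_word(pos + len(prefix))
        if not prefix or not suffix:
            break  # incomplete tail: no full (prefix, suffix) pair is left
        word = prefix + suffix
        pairs.append((index[prefix], index[suffix], word))
        index[word] = len(index) + 1  # the joint word enters the dictionary
        pos += len(word)
    return pairs


if __name__ == "__main__":
    for i, pair in enumerate(castore_parse("abcababccabc", "abc"), start=4):
        print(i, pair)
    # 4 (1, 2, 'ab')
    # 5 (3, 4, 'cab')
    # 6 (4, 3, 'abc')
    # 7 (5, 3, 'cabc')
```

The linear scan over the dictionary is the simplest choice for a sketch; a trie lookup would give the speed needed for strings of millions of symbols.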

We remark that this coding procedure, which pairs words already in the
dictionary to create a new word, is similar to the procedure that can be found
in the recent work which seems to be able to give a very precise entropy
estimation, detecting very long range correlations in the English language.

7 Numerical experiments



In this section we show some numerical experiments supported by the theory of 
the previous sections. We consider some examples of fully chaotic and weakly 
chaotic dynamical systems and we measure the information content of the gen- 
erated symbolic orbits with respect to some partition. 

We measure the information content with two different data compression
algorithms, LZ77 with infinite window and CASToRe, whose internal workings
have been presented above. These two algorithms seem to be suitable for the
compression of zero-entropy strings (see Q and Proposition [si]) and they are fast
enough to allow the compression of long strings (we could manage to compress
trajectories of $O(10^7)$ symbols). From the computational point of view,
LZ77 requires a large amount of RAM (Random Access Memory), since it needs
to retain the entire string already encoded and the entire dictionary built, whereas
the algorithm CASToRe only remembers the dictionary, which is implemented in a
tree structure. Hence, the computation time is a distinguishing feature between
the two algorithms: CASToRe can compress a string of $O(10^7)$ symbols in a few
seconds.

As we will see, the results agree with theoretical predictions when they are
available, or with other numerical results which can be found in the literature.

It is worth remarking that, even if the two compression schemes are basically
different, the two algorithms give a behavior of the information content of the
same order in all the numerical experiments we performed.



7.1 The Manneville map 

We measure the information content of symbolic orbits drawn from the
Manneville map.

Let us consider again the Manneville map $T_z(x) = x + x^z \pmod 1$ as it has
been presented in Section 6.2.

Let us consider the partitions $\alpha_1, \alpha_2$ obtained by dividing $[0,1]$ into 2 or 4
subintervals. The partition $\alpha_1$ is the same described in Section 6.2 and $\alpha_2$ is
a refinement of $\alpha_1$: $\alpha_2$ is obtained by splitting each interval of $\alpha_1$ into two
equal parts. We denote by $K(x,\alpha_i,n)$, $i \in \{1,2\}$, the Algorithmic Information Content
of an $n$-long symbolic orbit of the Manneville map with initial condition $x$, with
respect to the partition $\alpha_i$.

By the results presented in ([21], ||, @) we have that the mean value of
$K(x,\alpha_i,n)$ over the initial conditions of the orbit, with respect to the Lebesgue
measure, is expected to be $E[K(x,\alpha_i,n)] \sim n^p$, with $p = \frac{1}{z-1}$ for $z > 2$, and
$E[K(x,\alpha_i,n)] \sim n$ for $z < 2$. Moreover (by the results of section [T^) the same
result holds for the information as it is measured by the algorithm LZ77. We
verified this statement numerically, and the result is shown in Figure 1.

If we use the information content as it is measured by CASToRe, the
numerical result is also close to the previous one. This agrees with the theoretical
results and indicates that the methods based on the Computable Information
Content are experimentally reliable.

We considered a set of one hundred initial points, generated the corresponding
$10^7$-long orbits and applied the compression algorithms to the associated
symbolic strings $s$ (with respect to both the partitions $\alpha_1$ and $\alpha_2$).
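
A symbolic orbit of this kind can be generated along the following lines. This is a minimal sketch in double-precision arithmetic, which is an assumption (the text does not say how the orbits were computed). The names `manneville_threshold` and `manneville_symbols` are hypothetical. The threshold $\tilde{x}$ of the two-symbol partition is obtained by bisection on $x + x^z = 1$.

```python
def manneville_threshold(z: float, tol: float = 1e-12) -> float:
    """Solve x + x**z = 1 on (0, 1) by bisection (the left side is increasing)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid + mid ** z < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def manneville_symbols(x0: float, z: float, n: int) -> str:
    """First n symbols of the orbit of x0 under T_z(x) = x + x**z (mod 1).

    Symbol '0' codes the interval [0, x~] and '1' the interval (x~, 1].
    """
    threshold = manneville_threshold(z)
    x = x0
    symbols = []
    for _ in range(n):
        symbols.append("0" if x <= threshold else "1")
        x = (x + x ** z) % 1.0
    return "".join(symbols)


if __name__ == "__main__":
    print(manneville_threshold(2.0))          # close to (sqrt(5) - 1) / 2
    print(manneville_symbols(0.1, 3.0, 60))   # long runs of '0' are laminar phases
```

Feeding such strings of increasing length to a compressor and averaging the compressed sizes over many initial points gives an estimate of the mean information as a function of $n$.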

Table 1 shows the results. The first column indicates the partition to
which the results refer (through its number of symbols). The second column is
the value of the parameter $z$ which drives the dynamics of the system. The last
column gives the theoretical value of the exponent $p$ of the asymptotic behavior
$K(x,\alpha_i,n) \sim n^p$. The third and the fourth columns show the experimental
results. Each number shown is the average $p$ of the exponents of one hundred
different orbits. The initial conditions of the orbits are chosen randomly with
respect to the Lebesgue measure.



| Symbols | z | LZ77 | CASToRe | Theoretical value |
|---|---|---|---|---|
| 4 | 2.5 | 0.64 | 0.64 | 0.66 |
| 2 | 2.5 | 0.64 | 0.64 | 0.66 |
| 4 | 3 | 0.49 | 0.43 | 0.5 |
| 2 | 3 | 0.47 | 0.48 | 0.5 |
| 4 | 4 | 0.27 | 0.25 | 0.33 |
| 2 | 4 | 0.32 | 0.28 | 0.33 |

Table 1: Theoretical and experimental results for the Information content of the
Manneville map



Figure 1 shows several examples of the behavior of the Information
Content $I_Z$ when $Z$ = LZ77 (on the right) or $Z$ = CASToRe (on the left),
for different values of the parameter $z$. The scale is bilogarithmic, so that
power laws become straight lines and the exponent $p$ of the expected power law
is the slope of the corresponding straight line.
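
The exponent can be read off such a plot by an ordinary least-squares fit in log-log coordinates. The sketch below shows one way to do it; the function name `fit_exponent` is hypothetical, and the text does not state which fitting procedure was actually used for Table 1.

```python
import math
from typing import Sequence


def fit_exponent(ns: Sequence[float], infos: Sequence[float]) -> float:
    """Least-squares slope of log(information) against log(n)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(v) for v in infos]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic check: an exact power law 5 * n**0.5 has slope 0.5.
    ns = [10 ** k for k in range(2, 8)]
    print(fit_exponent(ns, [5.0 * n ** 0.5 for n in ns]))
```

On real compression data a logarithmic correction, such as the $\log_2(n)$ factor in the upper bound of Theorem 34, slightly bends the line, so the fitted slope depends on the range of $n$ used.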

7.2 The logistic map 

In this section we study the logistic map at the chaos threshold from an
experimental point of view. We recall that the logistic map is defined by

$$f(x) = \lambda x (1 - x) , \quad x \in [0,1] , \quad 1 \leq \lambda \leq 4 . \qquad (17)$$

The logistic map has been used to simulate the behavior of biological species
not in competition with other species. Later the logistic map has also been
presented as the first example of a relatively simple map with an extremely rich
dynamics (|0|, [13]). If we let the parameter $\lambda$ vary from 1 to 4, we find a sequence
of bifurcations of different kinds. For values of $\lambda < \lambda_\infty = 3.56994567187\dots$,
the dynamics is periodic and there is a sequence of period doubling bifurcations
which leads to the chaos threshold for $\lambda = \lambda_\infty$.

Figure 1: The plot of the above experiment: each graph represents the behavior
of the information plotted versus the number of steps (in log-log scale). Note
that in log-log scale power laws become straight lines. On the left we plot the
information as it is measured by CASToRe, on the right we plot the information
as it is measured by LZ77.



Numerical experiments and heuristic considerations from the physical
literature indicate that at the chaos threshold there is a power-law "sensitivity" to
initial conditions (here the sensitivity to initial conditions was measured with a
generalized Lyapunov exponent). These facts motivated the application of
generalized entropies to the map ([30]).

Moreover by the more recent results of Q we know that if we consider 
the Lebesgue measure, then for almost any initial condition the Algorithmic 
Information Content of an orbit will increase as the logarithm of the number of 
steps. 

In the following, we will show how we have experimentally confirmed this 
result measuring the information with LZ77 and CASToRe. 

In Figure 2 the main plot is in bilog scale, while the inset shows the same
graphs in log-linear scale. On the left are the experiments performed via
CASToRe, on the right those performed via LZ77. The analysis of the
results is the same for both pictures.

The solid line in the main plot represents the information behavior at the
chaos threshold. This graph already indicates that at the chaos threshold $\lambda_\infty$ the
information increases below any power law (any power law becomes a straight
line when plotted in bilog scale and our graph is evidently concave), as predicted
by the theory. A more accurate quantitative analysis was done in [§. The
upper lines refer to values of $\lambda > \lambda_\infty$ for which the map is chaotic
(the information increases linearly with time). The lower lines represent the
information behavior when $\lambda$ tends to $\lambda_\infty$ from below (along the period doubling
cascade), hence when we are in the periodic regime (where the Algorithmic
Information Content is expected to behave logarithmically).

Figure 2: On the left we have the information vs. number of steps for a typical
point of the interval as it is measured by CASToRe, on the right the same with
LZ77. The plot is in log-log scale, while in the inset the plot is log-linear to show
how the long time behavior of the information follows a logarithmic increase.
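
The qualitative difference between the regimes can be reproduced with a few lines of Python. The sketch below rests on two assumptions that are not taken from the text: the partition is $\{[0, 1/2), [1/2, 1]\}$, and the compressor is `zlib` from the standard library. DEFLATE is an LZ77-type scheme with a finite 32 KiB window followed by Huffman coding, so it is only a rough stand-in for LZ77 with infinite window or for CASToRe. The function names are hypothetical.

```python
import zlib


def logistic_symbols(lam: float, x0: float, n: int, burn_in: int = 1000) -> bytes:
    """Binary symbolic orbit of f(x) = lam * x * (1 - x), after a transient."""
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = lam * x * (1.0 - x)
    out = bytearray()
    for _ in range(n):
        out.append(ord("1") if x >= 0.5 else ord("0"))
        x = lam * x * (1.0 - x)
    return bytes(out)


def compressed_bits(symbols: bytes) -> int:
    """Size in bits of the zlib-compressed string (a proxy for the information)."""
    return 8 * len(zlib.compress(symbols, 9))


if __name__ == "__main__":
    for lam in (3.5, 3.56994567187, 4.0):  # periodic, chaos threshold, chaotic
        sizes = [compressed_bits(logistic_symbols(lam, 0.3, n))
                 for n in (1000, 10000, 100000)]
        print(lam, sizes)
```

With this stand-in the chaotic case grows roughly linearly in $n$ while the periodic case stays almost flat. Resolving a logarithmic law at the threshold requires a compressor that is sensitive to low-entropy strings, which is the role CASToRe plays in the experiments.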

7.3 Tirnakli, Tsallis, Lyra (TTL)-circular like maps 

These maps have been introduced in p9[ , where they are studied numerically,
as modifications of the classical standard map. They are one-dimensional
maps and, varying the parameters, they show a transition to chaos. They are
defined by

$$T_z(x) = \Omega_z + \left(x - \frac{1}{2\pi}\sin(2\pi x)\right)^{z/3} \pmod 1$$

and we study the maps with parameter values $z = 3$, $z = 4$, $z = 5$ with
$\Omega_3 = 0.606661063469$, $\Omega_4 = 0.648669091983$, $\Omega_5 = 0.6788311756505$, for which values
the maps are at the onset of chaos.

We recall that for $z = 3$ we obtain the classical standard map

$$T(x) = \Omega + \left(x - \frac{K}{2\pi}\sin(2\pi x)\right) \pmod 1$$

with $K = 1$ (at the edge of a quasiperiodic transition to chaos).

For these maps the results in p9| give numerical evidence of power-law sensitivity
to initial data, as was shown in [30] for the logistic map at the edge of chaos.
Also, by results relating sensitivity and complexity$^7$, this would correspond to a
logarithmic increase of the algorithmic information. We measured the information
coming from these maps, obtaining a behavior that is also similar to the logistic
map at the edge of chaos and fits with the cited numerical results (Figure 3).

7 In the cited paper, quantitative results are proved between initial condition sensitivity
and complexity.

Figure 3: The information vs. number of steps for the TTL-circular like maps
for a typical point, studied using the algorithm CASToRe. The solid line
refers to the map for $z = 3$ with 2 or 4 symbols; for the dashed and dotted curves
we have $z = 4$ and $z = 5$. Inset: same graph in log-linear scale.

7.4 Casati-Prosen map 

This area-preserving map has been proposed in || as a model of quantum chaos.
The map is defined on $\mathbb{T}^2 = [-1,1) \times [-1,1)$ by $T(x_n, y_n) = (x_{n+1}, y_{n+1})$ where

$$x_{n+1} = x_n + y_{n+1}$$
$$y_{n+1} = y_n + \alpha\, \mathrm{sgn}(x_n) + \beta$$

and $\alpha = \frac{\frac{1}{2}(\sqrt{5}-1) - e^{-1}}{2}$, $\beta = \frac{\frac{1}{2}(\sqrt{5}-1) + e^{-1}}{2}$. Results in ^ provide numerical
evidence that the map is ergodic and mixing, with linear speed of separation of
nearby starting orbits.

We studied the complexity of some trajectories of the system, finding that
the computable information seems to increase as a power law $n^p$ with
exponent $p$ approximately equal to 0.75 ($p = 0.742$ when estimated by LZ77
and $p = 0.755$ when estimated by CASToRe). This result is quite unexpected
in view of the connection between sensitivity to initial conditions and the asymptotic
behavior of the information content (p0[). From the cited results, a linear
initial condition sensitivity would correspond to a logarithmic increase of the AIC.
However, the rigorous proof of all the properties of the map remains an open
problem.

Figure 4: The information vs. number of steps (for CASToRe, on the left, and
LZ77, on the right) for a typical point in the Casati-Prosen map. The plot is in
log-log scale.



7.5 The Arnold cat map 

The Arnold cat map is an example of a two-dimensional hyperbolic toral
automorphism, that is the projection on the two-torus $\mathbb{R}^2/\mathbb{Z}^2$ of a linear map of $\mathbb{R}^2$,
represented by a matrix $M$ with integer elements, determinant one and real
eigenvalues $\lambda$ and $1/\lambda$, different from 1. The Arnold cat map is specified by a
matrix $M$ whose largest eigenvalue is $\lambda = (3 + \sqrt{5})/2$. From a theorem of H it
follows that the Kolmogorov-Sinai entropy with respect to the Lebesgue measure
of a two-dimensional hyperbolic toral automorphism is given by the logarithm of
the modulus of the eigenvalue bigger than 1. Hence, in this case

$$h = \log_2 \lambda \simeq 1.388\dots$$
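
The value of the entropy can be checked numerically. The sketch below computes the base-2 logarithm of the largest eigenvalue of a 2x2 integer matrix of determinant one. The function name `toral_entropy` is hypothetical, and the matrix with rows (2, 1) and (1, 1) is an assumption: it is a commonly used representative of the cat map with trace 3, and any integer matrix with determinant one and trace 3 gives the same eigenvalue.

```python
import math


def toral_entropy(a: int, b: int, c: int, d: int) -> float:
    """log2 of the largest-modulus eigenvalue of [[a, b], [c, d]].

    Requires determinant one and |trace| > 2 (hyperbolic case).
    """
    trace = a + d
    det = a * d - b * c
    if det != 1 or abs(trace) <= 2:
        raise ValueError("expected a hyperbolic matrix with determinant one")
    lam = (abs(trace) + math.sqrt(trace * trace - 4)) / 2.0
    return math.log2(lam)


if __name__ == "__main__":
    print(toral_entropy(2, 1, 1, 1))  # about 1.3885
```

The same function shows why the eigenvalue must be $(3 + \sqrt{5})/2$: with trace 3 and determinant one the characteristic polynomial is $\lambda^2 - 3\lambda + 1$.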

Our computations give the same result. We only show the results for the com- 
pression algorithm CASToRe, since in this case no evident differences can be 
appreciated. Studying the behavior of the information content with respect to 
the length of the compressed string, we expect to find a straight line with angu- 
lar coefficient equal to the entropy of the dynamical system, when working with 
a generating partition. In figure || it is represented the information function 
for three different choices of the partition. The dotted line is obtained with a 
partition of the square [0, 1) x [0, 1) in two horizontal strips. The solid line is 
obtained with a partition in four equal squares along the axis, and the dashed 
line is obtained with a partition in four squares along the eigen-directions as- 
sociated to the matrix M. The angular coefficients of the lines are 0.98 for the 
dotted line, 1.56 for the solid line and 1.37 for the dashed line, showing that the 
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first two partitions considered are not able to simulate the whole complexity of 
the system, whereas the last one can be considered to be a generating partition. 




Figure 5: The information content of the Arnold cat map for three different 
partitions. The dashed line is the information content of the generating 
partition. For the partitions used see the description in the text. 



7.6 The Froeschle map 



This map was studied in [16] as an example of a symplectic map for which it is 
possible to find the integrable and non-integrable initial conditions. The map 
is defined on the two-torus $\mathbb{R}^2/(2\pi\mathbb{Z})^2$ by 

\[
\begin{aligned}
x_{n+1} &= x_n + a \sin y_n \\
y_{n+1} &= x_n + y_n + a \sin y_n
\end{aligned}
\]

with $a = 1.3$. We studied the behavior of the information content for orbits 
generated by two different initial conditions, one in the regular zone and the 
other in the irregular zone. Both orbits have been studied with CASToRe and 
with LZ77. In the regular zone, the initial condition is the point (0, 2.5). 
From the results in Figure 6 (dotted curves), one can see that both compression 
algorithms indicate regularity: the information content grows logarithmically 
(this behavior is clearly visible in the insets of the two panels, where the 
information content is plotted on a log-linear scale). The two compression 
algorithms also give a strong indication of full chaos for the irregular orbit, 
generated by the initial condition (2, 0). In Figure 6, the information content 
is a straight line (solid and dashed lines, for partitions into two vertical 
strips and into four equal squares, respectively) whose slopes, 0.40 and 0.44 
respectively, give an estimate 
of the Kolmogorov-Sinai entropy with respect to the measure associated to the 
initial point. These results are in agreement with those of 0|. 
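
The comparison between the two orbits can be reproduced along the following 
lines. This Python sketch iterates the map with a = 1.3 from the two initial 
conditions given above and compresses the symbolic orbits for the partition 
into two vertical strips. It is an illustration only: zlib stands in for 
CASToRe and LZ77, the strip boundary at x = pi is an assumption, and the 
function names are hypothetical. 

```python
import math
import zlib

TWO_PI = 2 * math.pi

def froeschle_step(x, y, a=1.3):
    """One step of the map, with both coordinates taken modulo 2*pi."""
    kick = a * math.sin(y)
    return (x + kick) % TWO_PI, (x + y + kick) % TWO_PI

def symbolic_orbit(x, y, steps):
    """Symbols for two vertical strips: 0 if x < pi, else 1 (assumed labelling)."""
    symbols = bytearray()
    for _ in range(steps):
        symbols.append(0 if x < math.pi else 1)
        x, y = froeschle_step(x, y)
    return bytes(symbols)

def information_bits(symbols):
    """Compressed size in bits; zlib is only a stand-in compressor."""
    return 8 * len(zlib.compress(symbols, 9))

regular = symbolic_orbit(0.0, 2.5, 100_000)  # initial condition in the regular zone
chaotic = symbolic_orbit(2.0, 0.0, 100_000)  # initial condition in the irregular zone

# Compressed size of growing prefixes: steps, regular orbit, irregular orbit.
for n in (1_000, 10_000, 100_000):
    print(n, information_bits(regular[:n]), information_bits(chaotic[:n]))
```

Printing the compressed size at several prefix lengths makes it possible to see 
whether the growth looks closer to logarithmic or to linear, which is the 
distinction drawn in the text. 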

Figure 6: Information content for the Froeschle map. On the left the results 
are obtained using CASToRe, on the right using LZ77. In both panels, the 
solid and dashed lines are for the fully chaotic orbit, and the dotted curves 
are for the regular orbit. In the insets only the regular orbit is plotted, on a 
log-linear scale. 